Abstract 

We present an extensive study of the foreshock and aftershock signatures accompanying peaks of 
book sales. The time series of book sales are derived from the ranking system of Amazon.com. We 
present two independent ways of classifying peaks, one based on the acceleration pattern of sales and 
the other based on the exponent of the relaxation. They are found to be consistent and reveal the 
co-existence of two types of sales peaks: exogenous peaks occur abruptly and are followed by a power 
law relaxation, while endogenous sale peaks occur after a progressively accelerating power law growth 
followed by an approximately symmetrical power law relaxation which is slower than for exogenous 
peaks. We develop a simple epidemic model of buyers connected within a network of acquaintances 
which propagates rumors and opinions on books. The comparison between the predictions of the 
model and the empirical data confirms the validity of the model and suggests in addition that social 
networks have evolved to converge very close to criticality (here in the sense of critical branching 
processes of opinion spreading). We test in detail the evidence for a power law distribution of book 
sales and confirm a previous indirect study suggesting that the fraction of books (density distribution) 
$P(S)$ of sales $S$ is a power law $P(S) \sim 1/S^{1+\mu}$ with $\mu \approx 2$. 

I. INTRODUCTION 



In the context of linear response theory, the fluctuation dissipation theorem provides an explicit 
relationship between microscopic dynamics at equilibrium and the macroscopic response 
that is observed in a dynamic measurement. It relates equilibrium fluctuations to 
close-to-equilibrium observables. In out-of-equilibrium systems, the existence of a relationship between 
the response function to external kicks and spontaneous internal fluctuations is not settled. 
In many complex systems, this question amounts to distinguishing between endogeneity and 
exogeneity and is important for understanding the relative effects of self-organization versus 
external impacts. This is difficult in most physical systems because externally imposed perturbations 
may lie outside the complex attractor, which itself may exhibit bifurcations. Therefore, 
observable perturbations are often misclassified. It is thus interesting to study other systems, 
in which the dividing line between endogenous and exogenous shocks may be clearer, in the 
hope that it would lead to insight about complex physical systems. The systems for which the 
endogenous-exogenous question is relevant span, beyond the physical sciences, from the biological 
to the social sciences. 

Here, we study a real-world example of how the response function to external kicks is related 
to internal fluctuations. We study the precursory and recovery signatures accompanying shocks 
in complex networks, which we test on the database of book ranks provided by Amazon.com. 
We find clear distinguishing signatures classifying two types of sales peaks. Exogenous peaks 
occur abruptly and are followed by a power law relaxation, while endogenous sales peaks occur 
after a progressively accelerating power law growth followed by an approximately symmetrical 
power law relaxation which is slower than for exogenous peaks. These results are rationalized 
quantitatively by a simple model of epidemic propagation of interactions with long memory 
within a network of acquaintances. The slow relaxation of sales implies that the sales dynamics 
is dominated by cascades rather than by the direct effects of news and advertisements, 
indicating that the social network is close to critical. We also perform a direct measurement 
on ranks that gives important constraints on the conversion that transforms sales into ranks. 

A short version of this work is 

The structure of the paper is as follows. We first present our database (Junglescan.com) 
and explain how we use it to estimate the sales from the ranks. To do so, we need to make 
some assumptions about the mechanisms underlying the sales dynamics. We discuss these 
points in section II. In a later section, we revisit this question and derive a way to measure 
this conversion directly. The epidemic branching process used in our study leads to an effective linear 
coarse-grained description of the complex nonlinear dynamics. We present our model in section 
III and derive some predictions about the behavior of the system before and after sales peaks. 
Then, the data are analyzed in section IV. We provide two independent classifications of 
exogenous and endogenous shocks. The obtained classifications are robust. The last section 
concludes with suggestions for future research. 

II. CONSTRUCTION OF THE DATA: CONVERTING RANKS INTO SALES 
A. The database 

The study was performed using data from Amazon.com. Amazon was founded in 1994 as an 
online bookseller and has since expanded its business into areas such as clothing, gourmet food, 
sports equipment and jewelry. With expected net sales of $6 billion in 2004, the American 
giant of internet trading is by far the largest e-retailer. 

Electronic data make it possible to deal with huge amounts of information that would be impossible to 
gather otherwise. With the birth of the Internet, the prospect of understanding social phenomena 
quantitatively is emerging. It is perhaps the first time that an organization can collect 
so much information. What people buy can help give an image of the state of society. 
Amazon provides almost everything that can be bought, and it operates on a global scale. 
That is why Amazon is an outstanding means to probe society. As Andreas S. Weigend, 
chief scientist of Amazon.com (2002-2004), wrote on his webpage: "Amazon.com might be the 
world's largest laboratory to study human behavior and decision making." Unfortunately, for 
obvious competitive reasons, Amazon has no incentive to share its data. It jealously keeps its 
sales secret. But Amazon is not completely unknowable. It provides on its website the rank 
of each of its products based on their past sales. 

In a first step, Y. Ageon in our group developed an application using "WebBrowser control" 
to capture automatically the ranks of books on Amazon.com at regular time intervals. 
WebBrowser control allows a user to browse sites on the Internet's World Wide Web. 
The application first opens Amazon's page with the URL 
http://207.171.185.16/exec/obidos/ASIN/XXXXXX where XXXXXX corresponds to the 
ISBN or ASIN code of the targeted product (book, DVD, CD, game, etc.). The rank is 
then found on the page using an algorithm which processes character strings. In this way, we 
constructed a database of the ranks of tens of books over several months with a sampling 
period of one hour. This allowed us to check the quality of the time series of the ranks of 
thousands of books which have been recorded by JungleScan (http://www.junglescan.com) 
over several years. JungleScan scans books on Amazon typically every six hours for those 
updated hourly by Amazon. For all books we have checked, we found identical rank values in 
JungleScan and in our directly constructed database, showing that we could trust the data 
from JungleScan for our study. 

B. Amazon ranking schemes 

In order for these data to be useful, we need to convert the ranks into meaningful "physical" 
units, such as sales or sales rates. The problem is that Amazon.com does not divulge the 
exact formula for this conversion (otherwise, its secrecy about its sales figures would be 
pointless). In our study, we use the time series of book ranks up to April 2004. Until October 
2004, the following ranking system held, which is the relevant one for our study. 

1. The official statement 

Amazon gives some hints about its ranking system. The official statement of Amazon is 
the following : 

What Sales Rank Means 

As an added service for customers, authors, publishers, artists, labels, and 
studios, we show how items in our catalog are selling. This bestseller list is much 
like The New York Times Bestsellers List, except it lists millions of items! The 
lower the number, the higher the sales for that particular title. 

The calculation is based on Amazon.com sales and is updated regularly. The 
top 10,000 best sellers are updated each hour to reflect sales in the preceding 24 
hours. The next 100,000 ranks are updated daily. The rest of the list is updated 
monthly, based on several different factors. 



2. Different time scales 

Amazon can use different time scales : 

• a short time scale: one day. That is the time scale we are interested in since our purpose 
is to study sales dynamics. 

• a long time scale. For instance, if Harry Potter sales were to crash overnight, its rank 
would not fall so sharply. This means that the whole history of a book can outweigh its 
instantaneous sales. It is worth understanding when the long time scale outweighs the short one. 

3. Evidence that Amazon uses different ranking schemes 

Books with ranks below 100,000 are re-ranked according to the sales during 
the last 24 hours. For these books, Amazon uses a short time scale to rank them. We will see 
that there are exceptions. For books with sales ranks over 100,000, Amazon does not explain 
the different factors used to update the ranks. 

Fig. 1 (top panel) shows an example in which the rank increased with large fluctuations up 
to 100,000 (the sales were steadily decreasing), then jumped to a few thousand, and then 
followed a very smooth, slowly increasing trajectory. The absence of fluctuations in this second, 
rightmost part of the graph implies that Amazon re-ranked the book using its sales data over 
a long time scale. Fig. 1 (bottom panel) shows a related effect. First, the sales fell off, resulting 
in the rank increasing to 100,000. At this rank level, Amazon switched to a long time scale 
ranking scheme, leading to a reassessment of the rank in the range of 10,000. At some later 
time, some sales occurred which led Amazon to switch back to a 24-hour time scale ranking 
scheme, resulting in the rank increasing dramatically to a level around $10^5$ but slightly below 
(so that the 24-hour ranking system is active). 

To sum up, only books with a low rank (typically less than ten thousand) are ranked at 
a time scale from a few hours to a day and are devoid of the pathological behavior shown in 
Fig. 1. We will thus restrict our study to the books with ranks below 10,000. 



C. Evidence that the rank-sales relation is close to a power law 

We present the findings of M. Rosenthal. For over six years, he has closely followed 
the sales ranks of his own books as both an author and a publisher. He has also used data 
points from other authors and publishers. Note that his analysis is unauthorized and in no 
way sponsored by Amazon, which keeps the sales-rank conversion secret. His analysis gives 
us approximate information on how to convert ranks into sales. 



Rank         Copies/day
1            X
10           100
100          30
1,000        10
10,000       2 (11 copies every 5 days)
100,000      1 copy a week
1,000,000    around 15 total, depends on pub. date

TABLE I: Estimate of M. Rosenthal of the rank-sales relationship. 



Table I shows the estimates of M. Rosenthal, which are converted into a rank-ordering or 
Zipf plot in Fig. 2. The power law dependence $S(R) \sim 1/R^{1/\mu}$ with exponent $\mu = 2.0 \pm 0.1$, 
as shown by the straight line with slope $-0.5$ in Fig. 2, translates into a standard power law of 
the complementary cumulative distribution of sales 

$$R \sim 1/S^{\mu}, \quad \text{with } \mu = 2.0 \pm 0.1 . \tag{1}$$

The bend for large ranks (small sales) describes the bulk of the distribution of sales. The 
tail of the distribution of sales, i.e., for large sales, is described by the part of the figure 
corresponding to small ranks, for which there does not appear to be a change of regime from 
a power law behavior, except for the fact that sales for the first ranks seem to fluctuate 
from book to book much more significantly than would be expected from a pure power law 
behavior. The current blockbuster with rank 1 may sell from hundreds of books per day to as 
many as tens of thousands of books per day. Such variations may perhaps be explained by 
the property that, while the typical fluctuations of the sales of the first rank are of the order of 
a few 100%, the distribution of the sales of the first rank is also a power law distribution of 
the form (1) with the same exponent, which means that it is not impossible to have very large 
variations of the sales of rank 1, much larger than their typical values. These fluctuations are 
rendered in Fig. 3, in the left part for small ranks. More data would be needed to determine whether 
the variations of the sales of the first ranks are explained by the power law distribution (1) or 
may perhaps reveal an amplification mechanism setting blockbusters apart. It is also possible 
that the increased fluctuations for the smallest ranks reflect the effect of the network structure 
of acquaintances. In contrast, the fluctuations of sales from rank 10 are typically of the order 
of 30%, in agreement with expectations derived from the power law (1). 

The correspondence between ranks and sales suggested in Figures 2 and 3 and captured by 
the phenomenological formula (1) allows us to convert the time series of ranks for all studied 
books into time series of sales. Given the possible uncertainties and errors in the 
calibration of the conversion from ranks to sales discussed above, one should consider this 
conversion as basically a convenient way to give a quantitative interpretation to the rank time 
series. 
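
As an illustration of this conversion, the following minimal sketch applies the power law $S(R) \sim 1/R^{1/\mu}$ with $\mu = 2$ to a rank series. The function names and the calibration point (rank 100 at 30 copies per day, read from Table I) are assumptions made for this example and are not necessarily the calibration used in the study.

```python
def rank_to_sales(rank, mu=2.0, ref_rank=100.0, ref_sales=30.0):
    """Estimate daily sales from a rank via S(R) = ref_sales * (R / ref_rank)**(-1/mu).

    ref_rank and ref_sales are an assumed calibration point (taken from Table I).
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    return ref_sales * (rank / ref_rank) ** (-1.0 / mu)


def ranks_to_sales(ranks, **kwargs):
    """Convert a time series of ranks into a time series of estimated sales."""
    return [rank_to_sales(r, **kwargs) for r in ranks]


if __name__ == "__main__":
    # Hypothetical rank series for one book, sampled at regular intervals.
    print(ranks_to_sales([2000, 500, 6, 40, 300]))
```

Only the exponent matters for the power law fits performed later: a different calibration point rescales all sales by a constant factor and leaves the fitted exponents unchanged.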

In summary, we will assume that the pdf $p(S)$ of book sales is a power law of the form 

$$p(S) \sim \frac{1}{S^{1+\mu}} \tag{2}$$

with $\mu = 2$. In a later section, we revisit this assumption and test it further from constraints on 
the averages of rank increments. 

III. DESCRIPTION OF THE MODEL 

A. Motivation: exogenous versus endogenous shocks 

Consider the two sales time series shown in Fig. 4, exemplifying two characteristic patterns. 

• Some books become best-sellers overnight, thanks to rocketing sales. Book A ("Strong 
Women Stay Young" by Dr. M. Nelson) jumped on June 5, 2002 from a rank in the 2,000s 
to rank 6 in less than 12 hours. On June 4, 2002, the New York Times published an 
article crediting the "groundbreaking research done by Dr. Miriam Nelson" and advising 
the female reader interested in having a youthful postmenopausal body to buy the 
book and consult it directly. This case is the archetype of what we will refer to as an 
"exogenous" shock. 

• Some books become best-sellers after a long and steady increase in their sales. Book B 
("Heaven and Earth (Three Sisters Trilogy)" by N. Roberts) peaked at the end of 
June 2002 after a slow and continuous growth, with no such newspaper article, followed 
by a similar, almost symmetrical decrease, the entire process taking about 4 months. 

B. Epidemic branching process with long-range memory 

Such a social epidemic process can be captured by the following simple model. The 
model is based on the idea that the instantaneous sales flux of a given book results from a 
combination of external forces, such as news, advertisements and sales campaigns, and of social 
influences in which each past reader may impregnate other potential readers in her network 
of acquaintances with the desire to buy the book. This impact of a reader onto other readers 
is not instantaneous, as people react at a variety of time scales. The time delays capture the 
time interval between social encounters, the maturation of the decision process (which can be 
influenced by mood, sentiments, and many other factors), and the availability and capacity to 
implement the decision. We postulate that this latency can be described by a memory kernel 
$\phi(t - t_i)$ giving the probability that a buy initiated at time $t_i$ leads to another buy at a later 
time $t$ by another person in direct contact with the first buying individual. We consider the 
memory function $\phi(t - t_i)$ as a fundamental macroscopic description of how long it takes for a 
human to be triggered into action, following the interaction with an already active human. $\phi(t)$ 
is normalized such that $\int_0^{\infty} \phi(t)\, dt = 1$. An initial buyer (the "mother" buyer) who 
notices the book (either from exogenous news or by chance) may trigger buying by first-generation 
"daughters," who themselves propagate the buying drive to their own friends, who 
become second-generation buyers, and so on. We describe the sum of all buys by a conditional 
Poisson branching process with intensity: 

$$\lambda(t) = \eta(t) + \sum_{i \,|\, t_i \le t} \mu_i\, \phi(t - t_i) , \tag{3}$$

where $\eta(t)$ is the rate of buys initiated spontaneously (for instance by listening to media 
coverage of a book, or by serendipity) without influence from previous buyers, and the mark 
$\mu_i$ is the number of potential buyers influenced by the buyer $i$ who bought earlier at time $t_i$. 
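
To make the intensity (3) concrete, the sketch below evaluates it for a given list of past buys. It is only an illustration: the function names, the constant spontaneous rate `eta`, and the regularized kernel $\phi(t) = \theta c^{\theta}/(t + c)^{1+\theta}$ (normalized to 1 on $t \ge 0$, with an assumed short-time cutoff `c`) are choices made here, not specifications from the text.

```python
def power_law_kernel(dt, theta=0.3, c=1.0):
    """Assumed normalized memory kernel: theta * c**theta / (dt + c)**(1 + theta), dt >= 0."""
    if dt < 0:
        return 0.0  # causality: a buy cannot trigger earlier buys
    return theta * c ** theta / (dt + c) ** (1.0 + theta)


def intensity(t, events, eta, kernel=power_law_kernel):
    """Conditional intensity: eta plus the sum of mu_i * kernel(t - t_i) over buys with t_i <= t.

    events is a list of (t_i, mu_i) pairs; eta is taken constant here for simplicity.
    """
    total = eta
    for t_i, mu_i in events:
        if t_i <= t:
            total += mu_i * kernel(t - t_i)
    return total


if __name__ == "__main__":
    # Hypothetical history: three past buys with their marks.
    history = [(0.0, 1.0), (2.5, 2.0), (4.0, 0.5)]
    print(intensity(5.0, history, eta=0.1))
```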

Our model is based on the key idea that the epidemic mechanism is basically the same for 
all books. Of course, the underlying networks of connected potential buyers are a priori not 
the same for different books. This can be accounted for by different values of the "branching 
ratio" as defined below. 

While this version of the epidemic model of sales treats each book independently, in reality, 
we should consider correlations between sales of different books which may be related by a 
common growth of interest (see for instance the case of books on financial markets, whose sales 
grow concomitantly during stock market bubbles). In addition, sales will exhibit some 
correlation at special epochs, such as Christmas. At this period of the year, all books that 
have gift appeal will sell more copies than they would have sold otherwise. In contrast, 
if a book does not have any gift appeal, its sales rank will fall between Thanksgiving and 
Christmas, even if its actual sales remain steady. Likewise, university students buy a huge 
number of textbooks and other required reading titles through Amazon during September and 
from mid-January through mid-February, which will depress the ranks of books that do not 
fall into this category. Sales ranks even vary in a regular manner during the course of the week. 
Some titles are primarily purchased by people at work or by homemakers when the kids are at 
school, while books with strong Associates support do relatively well on weekends. We will 
not take such effects into account. 

We do not specifically describe the lifetime of a book and treat the innovations $\eta$ 
as stationary. Assuming that sales are stationary allows us to define the probability that the sales 
reach a given value. This approximation should not be too bad over the time scales of months 
of our study but fails over longer time scales. 

C. Mean field solution 



Equation (3) can be written 

$$\lambda(t) = \eta(t) + \int_{-\infty}^{t} \int_{0}^{\infty} N[d\tau \times d\mu]\, \mu\, \phi(t - \tau) , \tag{4}$$

where $N[d\tau \times d\mu]$ is the standard notation for the number of events that occurred between $\tau$ 
and $\tau + d\tau$ with mark between $\mu$ and $\mu + d\mu$. In the physicist's notation, 
$N[d\tau \times d\mu] = \sum_i \delta(\tau - t_i)\, \delta(\mu - \mu_i)\, d\tau\, d\mu$. The lower bound of the integral over time in (4) 
can be taken, for instance, as the publication time of the book considered. Taking the ensemble average of (4) gives 

$$S(t) \equiv \langle \lambda(t) \rangle = \eta(t) + n \int_{-\infty}^{t} d\tau\, \phi(t - \tau)\, S(\tau) , \tag{5}$$

where the so-called "branching ratio" $n = \langle \mu \rangle$ is the average number of buys triggered by any 
"mother" within her acquaintance network. We have used the fact that 
$\langle \int \mu\, N[d\tau \times d\mu] \rangle = n\, \langle \lambda(\tau) \rangle\, d\tau$. 
The branching ratio $n$ depends on the network topology as well as on the social behavior of 
influences. We consider only the sub-critical regime $n < 1$ in order to ensure stationarity. The 
linear structure of equation (5) does not mean that the dynamics is linear. It is an effective 
coarse-grained description of the complex nonlinear dynamics. 

In order to solve for $S(t)$, it is convenient to introduce the Green function or "dressed propagator" 
$K(t)$, defined as the solution of (5) for the case where the source term $\eta(t)$ is a delta 
function centered at the origin of time: 

$$K(t) = \delta(t) + n \int_{-\infty}^{t} d\tau\, \phi(t - \tau)\, K(\tau) , \tag{6}$$

and by definition of $K(t)$: 

$$S(t) = \int_{-\infty}^{t} d\tau\, \eta(\tau)\, K(t - \tau) . \tag{7}$$

The cumulative effect of all the possible branching paths gives rise to the net sales flux $K(t)$ 
triggered by the initial event at time $t = 0$. The response function $K(t)$ can easily be obtained 
by taking the Laplace transform of (6): 

$$\hat{K}(\beta) = \frac{1}{1 - n\, \hat{\phi}(\beta)} . \tag{8}$$

Setting $\beta = 0$, we get 

$$\int_0^{\infty} d\tau\, K(\tau) = \frac{1}{1 - n} , \tag{9}$$



which means that $1/(1-n)$ is the average number of buyers influenced by one buyer through any 
possible lineage. This result can be recovered directly from the following argument: if $n$ is 
the average number of buyers influenced directly by one buyer, the total number of buyers 
influenced through any possible lineage is $\sum_{k=0}^{\infty} n^k = \frac{1}{1-n}$. The term $(1-n)\, K(t)\, dt$ is the probability 
that a purchase triggered by a buy at $t = 0$ occurs at time $t$ within $dt$. 
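
The normalization $\int_0^{\infty} K = 1/(1-n)$ can be checked numerically. The sketch below is a discrete-time analogue written for illustration only: the time step, the truncated power law kernel and the function names are assumptions, and the recursion is the discrete counterpart of the integral equation for the dressed propagator rather than a method taken from the text.

```python
def truncated_power_law(theta, max_lag):
    """Discrete bare kernel phi[s] ~ 1/s**(1 + theta) for s = 1..max_lag, normalized to sum to 1."""
    weights = [0.0] + [1.0 / s ** (1.0 + theta) for s in range(1, max_lag + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def dressed_propagator(phi, n, steps):
    """Solve K[t] = delta[t] + n * sum_{s >= 1} phi[s] * K[t - s] for t = 0..steps-1."""
    K = [0.0] * steps
    K[0] = 1.0  # the initial buy (discrete delta function)
    for t in range(1, steps):
        acc = 0.0
        for s in range(1, min(t, len(phi) - 1) + 1):
            acc += phi[s] * K[t - s]
        K[t] = n * acc
    return K


if __name__ == "__main__":
    n = 0.5
    K = dressed_propagator(truncated_power_law(theta=0.3, max_lag=50), n, steps=2000)
    print(sum(K), 1.0 / (1.0 - n))  # the two numbers should nearly coincide
```

Each generation of triggered buys carries a total weight $n^k$, so the sum over all times reproduces the geometric series discussed above, provided the horizon `steps` is long enough for the higher generations to have died out.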

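We consider the case where the "bare propagator" is $\phi(t) \sim 1/t^{1+\theta}$ with $0 < \theta < 1$, 
corresponding to a long memory process. It leads to: 

$$K(t) \sim 1/t^{1-\theta}, \quad \text{for } t < t^* \tag{10}$$

$$K(t) \sim 1/t^{1+\theta}, \quad \text{for } t > t^* \tag{11}$$

with 

$$t^* \propto 1/(1-n)^{1/\theta} . \tag{12}$$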



D. Prediction of the model 



1. Distribution of sales 



Starting from the evidence that the distribution $p(S)$ of sales is a power law (2), the linear 
expression (7) implies that the source terms $\eta(\tau)$ in (7) are also distributed as a power law with 
the same exponent $\mu$. Actually, the generalized central limit theorem applied to (7) implies 
that the pdf $p(S)$ is a stable Lévy law with index $\mu$ as soon as the source terms $\eta(\tau)$ in (7) 
are independently and identically distributed as a power law with an exponent $\mu < 2$. By the 
generalized central limit theorem, the characteristic function of $S$ is given by 



$$\hat{p}(u) = \exp\left[ -D\, |u|^{\mu} \right] , \tag{13}$$

where $\hat{p}(u)$ denotes the characteristic function of $S$ and $D$ is a measure of the magnitude 
of the sources $\eta(\tau)$. This translates into 

$$p(S) \propto \frac{D}{S^{1+\mu}} \quad \text{for large } S . \tag{14}$$

In reality, there is probably a dependence between the source terms $\eta(\tau)$. However, as long 
as their correlation function decays faster than $1/t$, this has only the effect of changing the 
coefficient $D$. We do not explore the situation in which the $\eta(\tau)$'s could have longer range 
correlations. 

2. Dynamical properties 



a. Exogenous shock. An external shock occurring at $t = 0$ can be modeled as a jump 
$S_0\, \delta(t)$. The response of the system for $t > 0$ is then: 

$$S(t) = S_0\, K(t) + \int_{-\infty}^{t} d\tau\, \eta(\tau)\, K(t - \tau) .$$

The expectation of the response to an exogenous shock is thus: 

$$E[S(t)] = S_0\, K(t) + \frac{\langle \eta \rangle}{1 - n} , \tag{15}$$

where $\langle \eta \rangle$ is the average source level (the second term follows from (9)). Expression (15) 
simply expresses that the recovery of the system from an external shock is entirely controlled 
by its relaxation kernel. 
For an external shock which is strong enough, for $t > 0$: 

$$E_{\rm exo}[S(t)] \sim K(t) . \tag{16}$$

It exemplifies that $K(t)$ is the Green function of the coarse-grained equation of motion of the 
system. 

b. Endogenous shock. Consider a realization which exhibits a large sales burst $S(t = 0) = S_0$ 
without any large external shock. In this case, the large endogenous shock requires a special 
set of realizations of the noise $\{\eta(t)\}$. We can write: $\eta = \tilde{\eta} + \langle \eta \rangle$. By construction, $\tilde{\eta}$ has 
a zero mean. Using this, we get: 

$$S(t) = \int_{-\infty}^{0} d\tau\, \tilde{\eta}(\tau)\, K(t - \tau) + \int_{-\infty}^{t} d\tau\, \langle \eta \rangle\, K(t - \tau) + \int_{0}^{t} d\tau\, \tilde{\eta}(\tau)\, K(t - \tau) .$$

The expectation of $S(t)$ is: 

$$E[S(t)] = \int_{-\infty}^{0} d\tau\, E[\tilde{\eta}(\tau)]\, K(t - \tau) + \frac{\langle \eta \rangle}{1 - n} . \tag{17}$$

As for an exogenous shock, the constant $\langle \eta \rangle / (1 - n)$, equal to the unconditional average $\langle S \rangle$, can be 
neglected. 

For $\tau < 0$, the expectation of $\tilde{\eta}(\tau)$ is not zero, because the value $S(0) = S_0$ is specified. 
In contrast, for $\tau > 0$, $E[\eta(\tau)] = \langle \eta \rangle$ since the conditioning does not operate after the shock. 
Consider the process $W(t) = \int_{-\infty}^{t} d\tau\, \tilde{\eta}(\tau)$. A standard result is that for $t < 0$: 

$$E\left[ W(t) \,|\, S(0) = S_0 \right] = (S_0 - E[S])\, \frac{{\rm Cov}[W(t), S_0]}{{\rm Var}[S_0]} \propto (S_0 - E[S]) \int_{-\infty}^{t} d\tau\, K(-\tau) .$$

This expression predicts that the expected path of the continuous innovation flow prior to the 
endogenous shock (i.e. for $t < 0$) grows like $\Delta W(t) \sim K(-t)\, \Delta t$ upon the approach to the 
time $t = 0$ of the large endogenous shock. In other words, conditioned on the observation of 
a large endogenous shock, there are specific sets of the innovation flow that led to it. These 
conditional innovation flows have an expectation $E[\eta(t < 0)] - \langle \eta \rangle \sim S_0\, K(-t)$ (we assume 
$S_0 \gg E[S]$). We thus obtain from (17), for $t > 0$ and $t < 0$: 

$$E_{\rm endo}[S(t)] \propto S_0 \int_{-\infty}^{\min(t,0)} d\tau\, K(t - \tau)\, K(-\tau) . \tag{18}$$

c. Distinguishing both shocks. The model predicts two different relaxations for exogenous 
and endogenous shocks according to expressions (16) and (18). Assuming that we are close 
to the critical point $n \approx 1$, we can use $K(t) \sim 1/t^{1-\theta}$. Table II gathers the aftershock and 
foreshock signatures. 





              Endo                                Exo
Aftershock    $S(t) \propto 1/t^{1-2\theta}$      $S(t) \propto 1/t^{1-\theta}$
Foreshock     $S(t) \propto 1/|t|^{1-2\theta}$    Abrupt peak



TABLE II: Aftershock and foreshock signatures for endogenous and exogenous shocks occurring at 
time t = 0. 
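
The entries of Table II depend on the single parameter $\theta$ of the memory kernel. As a small illustrative helper (the function name and the dictionary keys are assumptions made here), the predicted exponents can be tabulated for any trial value of $\theta$:

```python
def predicted_exponents(theta):
    """Exponents p of S(t) ~ 1/|t|**p predicted near criticality (n close to 1)."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return {
        "exo_aftershock": 1.0 - theta,         # relaxation after an exogenous peak
        "endo_aftershock": 1.0 - 2.0 * theta,  # relaxation after an endogenous peak
        "endo_foreshock": 1.0 - 2.0 * theta,   # precursory growth before an endogenous peak
    }


if __name__ == "__main__":
    for theta in (0.2, 0.3, 0.4):
        print(theta, predicted_exponents(theta))
```

For any $\theta$ in the allowed range, the endogenous exponent is smaller than the exogenous one, which is the slower endogenous relaxation discussed in the text.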



The prediction that the relaxation following an exogenous shock should happen faster 
(exponent $1 - \theta$) than for an endogenous shock (exponent $1 - 2\theta$) agrees with the intuition that 
an endogenous shock should have impregnated the network much more and thus have a 
longer-lived influence. This result is a non-trivial consequence of our model. If we perform the same 
calculation for an exponentially decaying memory kernel $\phi(t)$, the functional form of the 
recovery does not allow one to distinguish between an endogenous and an exogenous shock. 
For a memory kernel $\phi(t)$ decaying faster than an exponential, the endogenous relaxation 
turns out to be faster than the exogenous one. 

IV. EMPIRICAL DETERMINATION OF THE EXPONENTS OF SALES DYNAMICS 



A. Selection of peaks and fitting the power law 

We qualify a peak as a local maximum over a 3-month time window which is at least 
$k = 2.5$ times larger than the average of the time series over the same 3-month time window. 
The threshold value $k$ was determined after looking at several examples of time series. The 
results do not change significantly when $k$ is varied within large bounds (see below). 
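
The peak criterion can be sketched as follows. This is one possible reading, not the exact procedure of the study: it assumes daily sampling, a 90-day window centered on the candidate day and truncated at the edges of the series, and the name `find_peaks` is hypothetical.

```python
def find_peaks(sales, window=90, k=2.5):
    """Return indices i such that sales[i] is the maximum of the window centered on i
    and is at least k times the average of the series over that window."""
    half = window // 2
    peaks = []
    for i in range(len(sales)):
        lo = max(0, i - half)
        hi = min(len(sales), i + half + 1)
        segment = sales[lo:hi]
        mean = sum(segment) / len(segment)
        # Require a positive mean so that an all-zero stretch is never flagged.
        if mean > 0 and sales[i] == max(segment) and sales[i] >= k * mean:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    series = [1.0] * 200
    series[100] = 500.0  # synthetic burst
    print(find_peaks(series))
```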

We considered only the sales maxima corresponding to ranks reaching the top 50. 

We selected those time series which had at least 15 days after the peak, so that we can 
analyze the recovery signature following shocks. 

We fit the sales dynamics by a power law: 

$$S(t) = \frac{A}{(t - t_c)^{p}}$$

with $A$, $p$ and $t_c$ unknown. The "critical time" $t_c$ is expected to be close to the time $t_0$ of the 
peak. We know from previous experience in critical phenomena that the determination of the 
exponent $p$ can be quite sensitive to the fitted value of $t_c$. 

Moreover, we do not know a priori which window size should be taken. If the signal were a 
pure power law, it would not matter. But here the signal is approximately a power law only over 
a finite range. A large time window means more data and thus more accurate results, but we have 
to end the window before a possible change of regime due to (i) $n < 1$, implying the existence 
of the finite cross-over time $t^*$ given by (12), (ii) $S(t)$ tending to background noise around 
its average value, and (iii) the impact of other shocks that may interfere. 

In order to determine $A$, $p$ and $t_c$ for a given window size, we use a least squares method, 
i.e. we seek to minimize the following quantity: 

$$\sigma_{\log A,\, p,\, t_c} = \sum_{t_i} F_{\log A,\, p,\, t_c}(t_i)^2 ,$$

with 

$$F_{\log A,\, p,\, t_c}(t_i) = \log S(t_i) - \log A + p\, \log(t_i - t_c) .$$

As $\sigma$ is quadratic with respect to $\log A$ and $p$, the minimization of $\sigma$ with respect to these 
two parameters is straightforward and can be done analytically. Setting the partial derivatives 
$\partial \sigma / \partial \log A$ and $\partial \sigma / \partial p$ 
equal to zero, we just need to solve a linear system of two equations with two unknowns, which 
gives $\log A(t_c)$ and $p(t_c)$ as functions of the still unknown $t_c$. Now, we just need to minimize 

$$\sigma_{\log A(t_c),\, p(t_c),\, t_c} , \tag{19}$$

a function of one variable. The accompanying figure shows an example of $\sigma(t_c)$. To minimize such an irregular 
function, we scan over all the values of $t_c$. We typically use a total interval of one week around 
the time of the peak. 
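
A minimal sketch of this two-stage fit is given below, under stated assumptions: the candidate values of $t_c$ are supplied by the caller (for instance a grid spanning one week around the peak), all sales values are strictly positive, and the function name `fit_power_law` is hypothetical. For each candidate $t_c$, $\log A$ and $p$ follow from ordinary linear regression of $\log S$ on $\log(t - t_c)$, and the candidate with the smallest sum of squared residuals is retained.

```python
import math


def fit_power_law(times, sales, tc_candidates):
    """Fit S(t) = A / (t - tc)**p by least squares in log-log coordinates.

    Returns (A, p, tc, sigma) for the best candidate tc, or None if no candidate is usable.
    """
    y = [math.log(s) for s in sales]
    m = len(y)
    mean_y = sum(y) / m
    best = None
    for tc in tc_candidates:
        if any(t <= tc for t in times):
            continue  # log(t - tc) would be undefined; skip this candidate
        x = [math.log(t - tc) for t in times]
        mean_x = sum(x) / m
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx                    # equals -p
        intercept = mean_y - slope * mean_x  # equals log A
        sigma = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sigma < best[3]:
            best = (math.exp(intercept), -slope, tc, sigma)
    return best


if __name__ == "__main__":
    # Synthetic relaxation with A = 100, p = 0.7, tc = -0.5 (days relative to the peak).
    t = [float(d) for d in range(1, 61)]
    s = [100.0 / (d + 0.5) ** 0.7 for d in t]
    print(fit_power_law(t, s, [-2.0 + 0.5 * j for j in range(6)]))
```

Scanning $t_c$ on a grid instead of using a gradient-based optimizer mirrors the remark above that $\sigma(t_c)$ is an irregular function of $t_c$.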

We perform a fit for different time windows ranging from a lower bound $I_{\min}$ to an upper 
bound $I_{\max}$. $I_{\min}$ has a fixed value, set to 15 days. In contrast, $I_{\max}$ depends on the time 
series. $I_{\max}$ is calculated as the time at which the minimum of the sales occurs, over a time 
window running from 25 days to 6 months. $I_{\max}$ is fixed in this way to prevent a rise in sales 
from being taken into account in the fit. 

Once the fit has been performed for the different time windows, we look at the correlation 
coefficients of the fits and choose the window which leads to the best correlation coefficient. 
This method turns out to give a robust estimate of the power law decay of the relaxation of 
sales. 
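
The window selection step can be illustrated as follows. This sketch simplifies the procedure in two ways that are assumptions of the example: $t_c$ is held fixed rather than re-fitted for each window, and window lengths are counted in samples (one per day). The function names are hypothetical.

```python
import math


def log_log_correlation(times, sales, tc):
    """Pearson correlation coefficient between log(t - tc) and log S(t)."""
    x = [math.log(t - tc) for t in times]
    y = [math.log(s) for s in sales]
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # correlation undefined; treat as no linear relation
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def best_window(times, sales, tc, min_len, max_len):
    """Return (length, r) for the window length in [min_len, max_len] with the largest |r|."""
    best = None
    for length in range(min_len, min(max_len, len(times)) + 1):
        r = log_log_correlation(times[:length], sales[:length], tc)
        if best is None or abs(r) > abs(best[1]):
            best = (length, r)
    return best


if __name__ == "__main__":
    # Synthetic relaxation: pure power law for 30 days, then a constant background level.
    t = [float(d) for d in range(1, 61)]
    s = [100.0 / min(d, 30.0) ** 0.7 for d in t]
    print(best_window(t, s, tc=0.0, min_len=15, max_len=60))
```

On such synthetic data, the selected window stops before the plateau, which is the behavior the upper bound on the window is meant to enforce.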

B. Distribution of exponents 

Out of some 14,000 books available on Junglescan in April 2004, our algorithm detects 
1,013 peaks which obey the constraints of reaching the top 50, with sufficient data before and 
after the peak and obeying the condition of not being contaminated by a nearby peak, as 
specified above. Among these 1,013 peaks, we select those followed by a relaxation which can 
be well approximated by a power law, with the criterion that the correlation coefficient $r$ of the 
corresponding fit is larger than 0.95. This leads us to keep 138 peaks. Making this selection 
does not change our results qualitatively but improves the quantitative findings somewhat. We 
have experimented with different thresholds for the correlation coefficient, between 0.8 and close to 1, and 
find the same results, with larger error bars for lower thresholds. 

The accompanying figure shows the distribution of power law exponents for the decrease of sales after peaks. 
We find two clusters corresponding to peaks respectively with an exponent $0.2 < p < 0.6$ 
and with an exponent $0.6 < p < 1$. This suggests that these two clusters can be seen as the 
endogenous cluster ($1 - 2\theta \approx 0.4$) and the exogenous one ($1 - \theta \approx 0.7$). This provides a first 
estimate for the exponent $\theta \approx 0.3$. 

According to the epidemic model proposed here, the small values of the exponents (close 
to $1 - \theta$ and $1 - 2\theta$ for the exogenous and endogenous relaxations, respectively) imply that 
the sales dynamics is dominated by cascades involving high-order generations rather than by 
interactions stopping after first-generation buy triggering. Indeed, if buys were initiated mostly 
by the direct effects of news and advertisements without amplification by triggering cascades 
in the acquaintance network, the cascade model would predict an exponent $1 + \theta$ given by the 
"bare" memory kernel $\phi(t)$. The values smaller than 1 of the two exponents for exogenous 
and endogenous shocks accordingly imply that the average number $n$ (the average branching 
ratio in the language of branching models) of impregnated buyers per initial buyer in the social 
epidemic model is very close to its critical value 1, because the renormalization from 
$\phi(t)$ to $K(t)$ given by (10) only operates close to criticality, which is characterized by the occurrence 
of large cascades of buys. Conversely, a value of the exponent $p$ larger than 1 would suggest 
that the associated social network is far from critical. 

C. Stacking the peaks 

According to our model and as summarized in Table II, the peaks belonging to the cluster
with high $p$ ($p \approx 0.7$) should be in the exogenous class, and therefore should be reached
by abrupt jumps without detectable precursory growth. In contrast, the
peaks belonging to the cluster with $p \approx 0.4$ should be in the endogenous class, and therefore
should be associated with a progressive precursory power law growth $1/(t_c - t)^p$ with exponent
$p = 1 - 2\theta$.

To check this prediction, the following algorithm categorizes the growth of sales before each
of the peaks according to its acceleration pattern. We differentiate between peaks that have an
increase in sales by a factor of at least $k_{exo}$ prior to the peak and peaks that have an increase
in sales by a factor of less than $k_{endo}$ over the same period. More precisely, we compare the value
of the sales at the time of the peak (day D) with the average value of the sales from day D-4
to day D-1. We tried several values for the two coefficients $k_{exo}$ and $k_{endo}$.

We find that the larger $k_{exo}$, the larger the exponent of the average relaxation for books
that have an increase in sales by a factor of more than $k_{exo}$. Conversely, we find that the smaller
$k_{endo}$, the smaller the exponent of the average relaxation for books that have an increase in
sales by a factor of less than $k_{endo}$. Both results agree with the intuition that, with more selective criteria,
we only keep peaks that are very easy to classify and thus reduce the probability of
misclassification.

Finally, we set $k_{exo} = 30$ and $k_{endo} = 2$. This means that peaks for which the acceleration factor
was between 2 and 30 were not considered in the subsequent analysis leading to Fig. 7. Out
of the 138 peaks, 30 remain.
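The classification rule described above can be sketched as follows. The function names and the three toy series are hypothetical; only the rule itself (sales on day D divided by the mean sales over days D-4 to D-1, compared with the two thresholds 30 and 2) is taken from the text. The sketch assumes that at least four days of data precede the peak.

```python
def acceleration_factor(sales, peak_index, window=4):
    """Ratio of sales on the peak day D to the mean sales over D-4..D-1."""
    before = sales[peak_index - window:peak_index]
    return sales[peak_index] / (sum(before) / len(before))

def classify_peak(sales, peak_index, k_exo=30.0, k_endo=2.0):
    """Label a peak from its precursory growth.

    'exogenous'  : jump by a factor of at least k_exo
    'endogenous' : increase by a factor smaller than k_endo
    'ambiguous'  : anything in between (discarded from the analysis)
    """
    factor = acceleration_factor(sales, peak_index)
    if factor >= k_exo:
        return "exogenous"
    if factor < k_endo:
        return "endogenous"
    return "ambiguous"

# Hypothetical daily sales series, with the peak at index 4 in each case.
abrupt = [1.0, 1.2, 0.9, 1.1, 60.0, 30.0]
gradual = [20.0, 24.0, 30.0, 40.0, 45.0, 38.0]
middle = [5.0, 5.0, 5.0, 5.0, 50.0, 20.0]
labels = [classify_peak(s, 4) for s in (abrupt, gradual, middle)]
```

Widening the gap between the two thresholds discards more peaks as ambiguous, which is the trade-off between sample size and classification reliability discussed in the text.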

Fig. 7 shows the average relaxation and precursory acceleration. For shocks classified as
exogenous according to their acceleration pattern, we find a relaxation governed by an exponent
$1 - \theta \approx 0.7$. For shocks classified as endogenous, both aftershocks and foreshocks are controlled
by an exponent $1 - 2\theta \approx 0.4$. These results match the predictions of the model (see
Table II).

Aftershock: the best least-squares fit with a power law gives a slope $1 - \theta \approx
0.7$ for exogenous shocks and a slope $1 - 2\theta \approx 0.4$ for endogenous shocks. One can observe a
crossover for $t - t_c \approx 60$ to $80$ days in both cases. It is tempting to interpret this crossover
as the change of regime predicted by the model for $t \approx t^*$ (see section III C). Indeed, for
$t > t^*$, we should expect a power law with exponent $1 + \theta$, both for exogenous and endogenous
shocks. Unfortunately, the data beyond the crossover do not extend sufficiently far to allow us to constrain
the exponent of the second regime.

Foreshock: the best least-squares fit with a power law gives a slope $1 - 2\theta \approx 0.4$
for the endogenous foreshocks. The time on the x-axis has been reversed to compare the
precursory acceleration with the aftershock relaxation. The superposition of the two top curves
for the precursory and relaxation behavior of the endogenous peaks confirms the symmetric
behavior predicted by the model (see Table II).
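The stacking itself relies on a log average over the peaks of a class (as stated in the caption of the corresponding figure). A minimal sketch of such an average is given below; the function name `log_average` and the two synthetic relaxations are hypothetical, and the alignment of the series on the peak time is assumed to have been done beforehand.

```python
import math

def log_average(series_list):
    """Geometric mean across peaks at each time lag.

    Each series holds the sales of one peak at lags 1, 2, 3, ... days
    after the peak (or before it, once time has been reversed).
    """
    n_lags = min(len(s) for s in series_list)
    stacked = []
    for lag in range(n_lags):
        mean_log = sum(math.log(s[lag]) for s in series_list) / len(series_list)
        stacked.append(math.exp(mean_log))
    return stacked

# Two hypothetical relaxations with the same exponent but different amplitudes.
lags = range(1, 31)
peak_a = [10.0 * t ** -0.4 for t in lags]
peak_b = [1000.0 * t ** -0.4 for t in lags]
stacked = log_average([peak_a, peak_b])
```

Averaging logarithms rather than raw sales prevents the few largest peaks from dominating the stacked curve, while leaving a common power law exponent unchanged.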

D. Detailed analysis of exogenous peaks 

Table III lists the 10 peaks that have an exponent larger than 0.65 among the thirty peaks
used to make Fig. 6. Fig. 9 shows the evolution of the sales for these 10 peaks, from ten days
before to 70 days after the peaks. Fig. 8 shows the time evolution of the sales of the book
"Get with the Program," written by the personal trainer of Oprah Winfrey, and its remarkable
succession of exogenous peaks associated with the regular appearances of the book on Oprah's TV
show.

In Fig. 9, eight peaks are reached by a fast acceleration of the sales, as expected from our
model and the classification. However, for two books, "Star Wars" and Stephen King's novel,
the situation is different. Their acceleration patterns are classified as endogenous (i.e., slowly
accelerating growth) by our algorithm. Yet they exhibit a fast relaxation (large exponent
$p = 1 - \theta$), which means that they are classified as exogenous according to the first criterion,
based on the exponent of the relaxation after the peak. So, what is wrong?

Consider the example of the book entitled "Star Wars." Its success was triggered by the
release of the movie. But, as the advertising campaign lasted several weeks, not to say
several months, it is no wonder that we do not observe a jump but a slowly accelerating growth. For such
a huge sales campaign, modeling the source term $\eta(t)$ responsible for the shock by a Dirac
function is a very poor approximation. Here, the duration of the advertising campaign
cannot be neglected.

For Stephen King's novel, the situation looks much the same. In September 2003, the Board
of Directors of the National Book Foundation announced that its 2003 Medal for Distinguished
Contribution to American Letters would be conferred upon Stephen King. This is America's most
prestigious literary prize. The ceremony took place two months later, on 11/19/2003, shortly
after the observed peak. As for "Star Wars," the temporal extension of the exogenous impact of
the news cannot be neglected.

This finding suggests that modeling external influences (news, advertisements) by a Dirac
function $\delta(t)$ is not adequate in some cases. The external shock may have a significant duration
$T$. We then expect the sales to grow slowly and exhibit a plateau over a time scale proportional
to $T$, and then to cross over to the exogenous decay rate for times $t > T$. Fig. 10 shows
that this is indeed the case for the two books "Star Wars" and "Wolves of the Calla," which
were exceptions to our classification in terms of their acceleration pattern: the long durations
of the peaks are consistent with the fact that the external news had an impact over an extended
period of time, leading to their misclassification as endogenous with respect to their acceleration
before the peak. The time scale of about 20 days of the plateau is consistent with the known
duration of the external news.

These two examples show the need to refine our analysis to allow for a more general description
of external news. In particular, the epidemic model can in principle be used to invert the
time series of the sales $S(t)$ to obtain the amplitude of the flow of news $\eta(t)$. But to be effective
and reliable, this would require better data, for instance working directly on Amazon sales
rather than on sales reconstructed from ranks.
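The effect of a shock of finite duration $T$ can be illustrated by convolving a constant source lasting $T$ days with the exogenous response kernel $K(t) \sim 1/t^{1-\theta}$. The sketch below is only a toy discretization: the regularization of the kernel at $t = 0$, the value $T = 20$ days and the function names are assumptions made for the illustration, not elements of the original analysis.

```python
import math

THETA = 0.3          # value of theta estimated in the text
DURATION = 20        # hypothetical duration T of the external shock, in days

def kernel(t):
    """Exogenous response K(t) ~ 1/t^(1-theta), regularized at t = 0."""
    return (t + 1.0) ** -(1.0 - THETA)

def sales_response(t, duration=DURATION):
    """Sales at day t for a constant source switched on for `duration` days."""
    last_source_day = min(t, duration - 1)
    return sum(kernel(t - tau) for tau in range(last_source_day + 1))

def local_slope(t1, t2):
    """Log-log slope of the response between days t1 and t2."""
    return math.log(sales_response(t2) / sales_response(t1)) / math.log(t2 / t1)

growth = [sales_response(t) for t in range(DURATION)]   # rise while the source is on
late_slope = local_slope(1000, 2000)                    # approaches -(1 - THETA)
```

In this toy setting the sales keep rising for as long as the source is active, so the approach to the peak looks gradual, while the late-time decay recovers the exogenous exponent $1 - \theta$.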

E. Robustness of the results upon variation of the conditions of peak selection 

Previously, we kept 30 peaks, namely those whose relaxation gives a fit with a power law with
a correlation coefficient larger than 0.95 and which have an increase before the peak by a factor
of less than $k_{endo} = 2$ or of more than $k_{exo} = 30$. It is interesting to discuss the results
for less restrictive conditions.

Fig. 11 is similar to Fig. 6 and shows the distribution of power law exponents for the
decrease of sales after peaks, for those peaks whose relaxation can be fitted by a power law
with a correlation coefficient larger than 0.9. This less restrictive condition on the quality of
the power law fit selects 388 peaks out of the initial 1,013 peaks of our prefiltered data set, i.e.,
close to three times more peaks than before. We again find two rather clearly defined clusters, which can
be associated with the endogenous and the exogenous classes as previously.

Among the 388 peaks that have a correlation coefficient larger than 0.9, we kept the 270 peaks
which have an increase before the peak by a factor of less than $k_{endo} = 8.5$ or more than $k_{exo} = 12.5$. These
two coefficients were selected empirically, such that the algorithm makes a distinction between
slow and fast acceleration similar to what common sense would suggest. The $388 - 270 = 118$
rejected peaks correspond to blurred situations where the acceleration of the sales is neither
fast nor progressive. Fig. 12 is similar to Fig. 7 but for these 270 peaks. We again obtain
two different power law slopes, with a faster decrease and larger exponent for the class classified
as exogenous with respect to its fast foreshock acceleration than for the class classified as
endogenous with respect to its progressive foreshock acceleration. For $(k_{endo}, k_{exo}) = (8.5, 12.5)$,
we obtain $p = 0.54$ for the exogenous class and $p = 0.4$ for the endogenous class. To test
the sensitivity with respect to the coefficients $k_{endo}$ and $k_{exo}$, we report other values. For
$(k_{endo}, k_{exo}) = (5, 20)$ (respectively $(2, 30)$), which are not shown, we obtain a mean exponent
equal to 0.55 (resp. 0.57) for the exogenous class and 0.39 (resp. 0.39) for the endogenous
class. While the results are qualitatively consistent with those obtained with more stringent
conditions, we see that the exponent for the exogenous class is somewhat too small. This may be
due to the duration of the news, which is not instantaneous as discussed above, and to the
existence of other factors and disturbances not described here.

V. FURTHER TESTS AND CONSTRAINTS ON THE RANK-SALE CONVERSION

In this section, we compare the prediction of the model with the data on changes of ranks
to test the validity of the power law distribution of sales $P(S) \sim 1/S^{1+\mu}$ that we used to convert ranks into sales.

Let us denote by $\Delta R(t) = R(t + \Delta t) - R(t)$ the variation of rank over the time interval $\Delta t$
(which will be taken fixed and equal to 1 day), and by $\Delta S(t) = S(t + \Delta t) - S(t)$ the
variation of sales over the same time interval. It is clear that variations of ranks must be
interpreted relative to past ranks. A change of a few ranks when a book is at rank 10
is not the same as when it is at rank 10,000. This motivates us to study the conditional rank
variation

$$ \langle \Delta R(R) \rangle \equiv E[\Delta R(t) \,|\, R(t)] , \tag{20} $$

defined as the expected rank variation from time $t$ to $t + 1$ for a book which was at rank $R(t)$ at time
$t$. Similarly, the conditional sale variation

$$ \langle \Delta S(S) \rangle \equiv E[\Delta S(t) \,|\, S(t)] \tag{21} $$

is the expectation of the variation of the sales conditioned on its value before.

First, we will derive $\langle \Delta S(S) \rangle$ from our model. Then, using the postulated rank-sales conversion
given by the power law distribution of sales will provide a prediction for $\langle \Delta R(R) \rangle$. We will also measure $\langle \Delta R(R) \rangle$ directly,
without using the rank-sales conversion, and compare the two to test this conversion.

A. Prediction of the model of section III

1. Derivation of $\langle \Delta S(S) \rangle$
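Let us define

$$ \tilde{S}(t) = \int_{-\infty}^{t} d\tau \, \tilde{\eta}(\tau) K(t - \tau) , \tag{22} $$

with $\tilde{\eta} = \eta - \langle \eta \rangle$. Then, $S(t) = \tilde{S}(t) + \langle S \rangle$ and $\Delta S(t) = \Delta \tilde{S}(t)$. Let us expand $\Delta \tilde{S}(t)$:

$$ \Delta S(t) = \Delta \tilde{S}(t) = \int_{-\infty}^{t} d\tau \, \tilde{\eta}(\tau) \left( K(t - \tau + \Delta t) - K(t - \tau) \right) + \int_{t}^{t + \Delta t} d\tau \, \tilde{\eta}(\tau) K(t + \Delta t - \tau) . $$

Conditioned on the value of $\tilde{S}(t)$, the expectation of the second term of the r.h.s. is zero because
the conditioning does not affect times posterior to $t$. This implies

$$ \langle \Delta \tilde{S}(\tilde{S}) \rangle = \int_{-\infty}^{t} d\tau \, E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)] \left( K(t - \tau + \Delta t) - K(t - \tau) \right) . \tag{23} $$

Following the same reasoning as in section III D 2, we have, for all $\tilde{S}(t)$, $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)] \propto \tilde{S}(t) K(t - \tau)$.
Inserting this expression in (23), we obtain

$$ \langle \Delta S(S) \rangle = -\alpha \tilde{S} \tag{24} $$
$$ \phantom{\langle \Delta S(S) \rangle} = -\alpha (S - \langle S \rangle) , \tag{25} $$

with

$$ \alpha \sim - \int_{-\infty}^{t} d\tau \left( K(t - \tau + \Delta t) - K(t - \tau) \right) , \tag{26} $$

which is positive because $K$ is a decreasing function.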

2. Derivation of $\langle \Delta R(R) \rangle$ from $\langle \Delta S(S) \rangle$ assuming the power law distribution of sales to be valid

The power law distribution of sales implies that the function $R(S)$ is a power law of exponent $-\mu$. Thus, taking
the logarithmic derivative expressed as a finite difference gives

$$ \frac{\Delta R}{R} = -\mu \frac{\Delta S}{S} . \tag{27} $$

From (25) and (27), we get

$$ \frac{\langle \Delta R \rangle}{R} = \mu \alpha \left( 1 - \frac{\langle S \rangle}{S} \right) . \tag{28} $$

$S$ can be expressed as a function of $R$ as follows. Starting from the distribution $P(S) = C / S^{1+\mu}$, we express the constant
$C$ as $C = \mu S_{min}^{\mu}$ from the normalization of $P(S)$ in the interval from a minimum sale $S_{min}$ to
infinity. Then, $\langle S \rangle = \int P(S) \, S \, dS = \frac{\mu}{\mu - 1} S_{min}$. We have assumed $\mu > 1$, which implies that a
majority of the sales come from books with large ranks (i.e., low sales) and not from the
few blockbusters in the top ranks. This leads to

$$ R(S) = N P_{>}(S) = N \left( \frac{\mu - 1}{\mu} \right)^{\mu} \left( \frac{\langle S \rangle}{S} \right)^{\mu} , \tag{29} $$

where $N$ is the 'total' number of books. It is not very clear which value should be taken for
$N$. Amazon sells and ranks several million books. But, should our power-law conversion
be true, it would probably not be valid for the largest ranks in the million range. We can probably
expect to have $N \approx 10^4$ because of Amazon's change of ranking scheme around $R = 10^4$, which creates
a natural population of the first 10,000 ranked books. Or perhaps it could be $10^5$.
Putting together (28) and (29) leads to

$$ \langle \Delta R(R) \rangle = \alpha \mu R \left[ 1 - \frac{\mu}{\mu - 1} \left( \frac{R}{N} \right)^{1/\mu} \right] . \tag{30} $$

Our epidemic model of book buys, together with the assumption of a power law distribution
of sales with exponent $\mu$, thus predicts through expression (30) that the average variation of ranks is
proportional to the rank itself for small $R$ and to $-R^{1 + 1/\mu}$ for large rank. The non-monotonicity
of $\langle \Delta R(R) \rangle$, i.e., $\langle \Delta R(R) \rangle > 0$ for small $R$ and $\langle \Delta R(R) \rangle < 0$ for large $R$, simply reflects that
best-sellers tend to lose ranks, because being a best-seller requires sources
$\eta(t)$ of buyers that stay continuously above the average, which can only be achieved for a relatively short time.
Conversely, poorly ranked books can only improve their ranking on average.
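To make the sign change of the prediction concrete, the sketch below evaluates the predicted rank drift, assuming that expression (30) reads $\langle \Delta R(R) \rangle = \alpha \mu R \, [1 - \frac{\mu}{\mu - 1} (R/N)^{1/\mu}]$. The function names are hypothetical, and the numerical value of $\alpha$ is an arbitrary placeholder since it is not known from the data.

```python
def rank_drift(rank, alpha, mu, n_books):
    """Predicted drift: alpha*mu*R*[1 - mu/(mu-1) * (R/N)^(1/mu)]."""
    return alpha * mu * rank * (1.0 - mu / (mu - 1.0) * (rank / n_books) ** (1.0 / mu))

def zero_crossing(mu, n_books):
    """Rank R* at which the predicted drift changes sign."""
    return n_books * ((mu - 1.0) / mu) ** mu

# Illustrative parameters: mu = 2 as in the text, N = 10^4 as one of the
# values discussed for the number of books; alpha = 0.1 is a placeholder.
ALPHA, MU, N_BOOKS = 0.1, 2.0, 1.0e4
r_star = zero_crossing(MU, N_BOOKS)
drift_small = rank_drift(100.0, ALPHA, MU, N_BOOKS)    # positive: rank number grows
drift_large = rank_drift(5000.0, ALPHA, MU, N_BOOKS)   # negative: rank number shrinks
```

With these illustrative values the drift vanishes at $R^* = N ((\mu - 1)/\mu)^{\mu} = 2500$, which separates the books that lose ranks on average from those that gain ranks.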



B. Empirical test

Fig. 13 shows the overall behavior of $\langle \Delta R(R) \rangle$. For our purpose, we will ignore ranks $R > 10^4$
because, for such ranks, the Amazon ranking scheme is not appropriate, as already discussed
in section II B and illustrated by the artifacts for $R \approx 10^4$ and $R \approx 10^5$. These spurious peaks
reflect the shift of Amazon between different ranking schemes.

The part of Fig. 13 for $R < 10^4$ is magnified in Fig. 14. The predicted non-monotonic
behavior is observed, but expression (30) does not fit the experimental data for the small ranks.
The log-log plot of Fig. 14 clearly shows that $\langle \Delta R \rangle$ is given by

$$ \langle \Delta R \rangle \sim R^{\beta} , \quad \text{with } \beta = 1.5 \pm 0.05 , \tag{31} $$

over approximately three decades. Thus, $\langle \Delta R \rangle$ is not proportional to $R$ since $\beta$ is significantly
larger than 1. This departure from linearity can either signal a problem with the prediction
(25) of the model or be due to a breakdown of the power law assumption for the distribution of sales.
1. First option: breakdown of $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)] \propto \tilde{S}(t)$

Suppose instead of (25) that

$$ \langle \Delta S(S) \rangle \sim -S^{x} , \tag{32} $$

where $x$ may be different from 1. Then, following step by step the derivation leading to (30),
we obtain

$$ \langle \Delta R(R) \rangle \sim R^{(1 - x)/\mu} \, R , \tag{33} $$

where we have used the power law conversion (29) between sales and ranks derived from the power law distribution of sales.
When (25) holds, $x = 1$ and expression (33) recovers the linear dependence of $\langle \Delta R(R) \rangle$ on
$R$, for not too large $R$, quantified by the first term in the r.h.s. of (30). Now, expression (33)
is compatible with the data (31) together with the measurement $\mu \approx 2$ only if we take $x = 0$,
which implies that $\langle \Delta S(S) \rangle$ does not depend on $S$ to a first approximation. This in turn implies
that the result $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)] \propto \tilde{S}(t) K(t - \tau)$ obtained above following the reasoning of section
III D 2 does not hold and must be replaced by the statement that $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)]$ is independent
of $\tilde{S}(t)$. Note that the dependence of $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)]$ on $K(t - \tau)$ is not necessarily linked to
the validity of $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)] \propto \tilde{S}(t)$, and thus the proportionality $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)] \propto K(t - \tau)$ can
still hold, ensuring the validity of the exponent $1 - 2\theta$ for the foreshock and aftershock sales
associated with endogenous peaks (see Table II).

A possible interpretation of the breakdown of $E[\tilde{\eta}(\tau) \,|\, \tilde{S}(t)] \propto \tilde{S}(t)$ is that the amplitude
of the sales has a large stochasticity from book to book, so that the dependence of precursory
innovations on the future amplitude of the sales peak is lost. In sum, if we accept the
conclusion drawn above that the sales innovations are weakly dependent on, or are actually
independent of, the amplitude of the sales peak, then the observed law (31) can be seen as
a dramatic confirmation of the power law distribution of sales through the rank-to-sale power law conversion
(29) in the range of ranks up to a few thousands. However, as a note of caution, we cannot
exclude the possibility that $x \neq 0$ and $\mu \neq 2$ as long as $(1 - x)/\mu = 1/2$ holds: in such a case,
expression (33) shows that we would recover the observed power law relationship (31) from
the analogous derivation that led to (30).
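For readers who wish to retrace the step from (32) to (33), the chain of scalings can be written out explicitly. The sketch below uses only the finite-difference relation $\Delta R / R = -\mu \, \Delta S / S$ and the power law conversion $S \sim R^{-1/\mu}$, both taken from the text; prefactors are dropped throughout.

```latex
\begin{align*}
\frac{\langle \Delta R \rangle}{R}
  &= -\mu \, \frac{\langle \Delta S \rangle}{S} \sim \mu \, S^{x - 1} , \\
S \sim R^{-1/\mu}
  \;&\Rightarrow\; S^{x - 1} \sim R^{(1 - x)/\mu} , \\
\langle \Delta R(R) \rangle
  &\sim R^{1 + (1 - x)/\mu} ,
  \qquad \beta = 1 + \frac{1 - x}{\mu} .
\end{align*}
```

With $\beta = 1.5$ and $\mu = 2$ this gives $x = 0$, and more generally any pair $(x, \mu)$ with $(1 - x)/\mu = 1/2$ reproduces the observed exponent.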



2. Second option: deviations from the power law conversion

Let us consider a general function $S(R)$ relating ranks to sales. Equation (27) becomes
$\Delta S = S'(R) \Delta R$. Using the empirical evidence $\langle \Delta R \rangle \propto R^{\beta}$ and still assuming the validity of
$\langle \Delta S \rangle \propto -S$ for small $R$, we get $S(R) = S_{min} \exp\left( C / R^{\beta - 1} \right)$, where $S_{min}$ and $C$ are two constants.
Assuming that this expression for small $R$ can be used for all ranks of interest, we get:

$$ \langle \Delta R(R) \rangle = \frac{\alpha R^{\beta}}{C (\beta - 1)} \left[ 1 - \frac{\langle S \rangle}{S_{min}} \exp\left( -\frac{C}{R^{\beta - 1}} \right) \right] . \tag{34} $$

The two constants $S_{min}$ and $C$ can be adjusted to provide a rather good fit to the
data, as shown in Fig. 14. By construction, $\langle \Delta R \rangle$ has the correct behavior for small $R$. The whole
difficulty is to describe how $\langle \Delta R \rangle$ reaches its maximum and then decreases. Typically, the
fit gives $C = 165$, which leads to $S(R = 1)/S_{min} = \exp(165) \approx 10^{71}$. Obviously, this is totally absurd!
Our mistake was to derive $S(R)$ for small $R$ and then use it to calculate the behavior of
$\langle \Delta R \rangle$ for any $R$.
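The size of this ratio is easy to check numerically. The short sketch below evaluates $\log_{10}[S(R)/S_{min}] = C / (R^{\beta - 1} \ln 10)$, assuming that the fitted form reads $S(R) = S_{min} \exp(C / R^{\beta - 1})$ with $\beta = 1.5$ as measured in (31). The helper name `log10_sales_ratio` is hypothetical.

```python
import math

def log10_sales_ratio(rank, c_fit=165.0, beta=1.5):
    """log10 of S(R)/S_min for S(R) = S_min * exp(C / R**(beta - 1))."""
    return c_fit / rank ** (beta - 1.0) / math.log(10.0)

top_rank_decades = log10_sales_ratio(1.0)      # decades between S(R = 1) and S_min
rank_100_decades = log10_sales_ratio(100.0)    # decades remaining at R = 100
```

The top-ranked book would outsell the asymptotic level by more than seventy orders of magnitude, and several orders of magnitude would still remain at rank 100, which is why the extrapolation has to be rejected.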

A better approach is to invert the empirical function $\langle \Delta R(R) \rangle$ to get $S(R)$, without making
assumptions about it. For a given function $S(R)$, equation (28) becomes

$$ S'(R) \langle \Delta R(R) \rangle = -\alpha \left( S(R) - \langle S \rangle \right) . \tag{35} $$

Thus, we can express $S(R)$ as a function of $\langle \Delta R \rangle$ and obtain

$$ S(R) = \langle S \rangle + \left( S_{max} - \langle S \rangle \right) \exp\left( -\alpha \int_{1}^{R} \frac{dR'}{\langle \Delta R(R') \rangle} \right) , \tag{36} $$

where $S_{max}$ is a constant equal to $S(R = 1)$.

The problem in using (36) is that we do not know the value of the constant $\alpha$, which is rather
crucial as it appears in the exponential. Nevertheless, whatever the value of $\alpha$, we observe
roughly the same behavior for $S(R)$. Fig. 15 shows $S(R)$ obtained by inserting the empirical
dependence of $\langle \Delta R(R) \rangle$ on $R$ in expression (36) for fixed values of $\alpha$, $S_{max}$ and $\langle S \rangle$.
The "fast" decrease for small $R$ (typically $R < 10$ to $20$) followed by a "slow" one for large $R$
is typical of most reasonable parameters. This shows that the conversion can be roughly seen
as a power law (an approximate straight line in a log-log plot) in the range $R \in [20, 10^4]$. But
we are not able to determine the value of the exponent from this approach because we do not
know $\alpha$.
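The inversion can be carried out numerically once the rank drift is tabulated. The sketch below assumes that the inversion formula reads $S(R) = \langle S \rangle + (S_{max} - \langle S \rangle) \exp(-\alpha \int_1^R dR' / \langle \Delta R(R') \rangle)$ and integrates with the trapezoidal rule. The function name, the synthetic drift $\langle \Delta R \rangle = R^{1.5}$ and all parameter values are hypothetical, and the drift is assumed to stay positive on the grid so that the integrand is finite.

```python
import math

def invert_rank_drift(ranks, drift, alpha, s_max, s_mean):
    """Tabulate S(R) on a grid of ranks starting at R = 1.

    ranks : increasing grid of ranks, ranks[0] == 1
    drift : tabulated <Delta R(R)> on that grid (assumed positive)
    """
    sales = [s_max]
    integral = 0.0
    for i in range(1, len(ranks)):
        # trapezoidal rule for the integral of dR' / <Delta R(R')>
        step = ranks[i] - ranks[i - 1]
        integral += 0.5 * step * (1.0 / drift[i] + 1.0 / drift[i - 1])
        sales.append(s_mean + (s_max - s_mean) * math.exp(-alpha * integral))
    return sales

# Synthetic check with <Delta R> = R^1.5, for which the integral is
# 2 * (1 - R^-0.5) in closed form.
ranks = [1.0 + 0.01 * i for i in range(9901)]
drift = [r ** 1.5 for r in ranks]
sales = invert_rank_drift(ranks, drift, alpha=0.5, s_max=100.0, s_mean=1.0)
exact_last = 1.0 + 99.0 * math.exp(-0.5 * 2.0 * (1.0 - ranks[-1] ** -0.5))
```

Because $\alpha$ multiplies the integral inside the exponential, repeating the computation for several values of `alpha` shows directly how strongly the reconstructed curve depends on this unknown constant.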

In summary, the direct determination of the function $S(R)$ requires some additional assumptions,
as shown by the various attempts developed above. Nevertheless, we have seen that
a power law would explain the observed non-monotonic behavior of $\langle \Delta R(R) \rangle$. Notice that
if we assume $S(R)$ to be exponential, $S(R) = S_0 \exp(-R/R_0)$, the relation (35) implies

$$ \langle \Delta R(R) \rangle = \alpha R_0 \left[ 1 - \frac{\langle S \rangle}{S_0} \exp\left( \frac{R}{R_0} \right) \right] , \tag{37} $$

which is obviously monotonic and can thus be rejected by comparison with the data.
VI. CONCLUSION 

In our study of the ranks of books sold by Amazon.com, we have shown that sales shocks can
be classified into two categories: endogenous and exogenous. We have used two independent
ways of classifying peaks, one based on the acceleration pattern of sales (see section IV B) and
the other based on the exponent of the relaxation (see section IV C). We have developed an
epidemic model of influences between buyers connected within a network of acquaintances.
The comparison between the predictions of the model and the empirical data suggests that
social networks have evolved to converge very close to criticality (here in the sense of critical
branching processes). This means that tiny perturbations, which in any other state would be felt
only locally, can propagate almost without bound. Studies of critical phenomena show
that very different systems can exhibit fundamental similarities; this is generally referred to as
universality.

While we have emphasized the distinction between exogenous and endogenous peaks to set
the fundamentals for a general study, we also find closely repeating peaks as well as peaks that
may not be pure members of a single class. In a sense, one could argue that there are no real "endogenous peaks,"
because there is always a source or a string of news impacting the network
of buyers. We have thus distinguished between two extremes, the very large news impact and
the structureless flow of small news amplified by the cascade effect within the network. One
can imagine and actually observe a continuum between these two extremes, with feedbacks
between the development of endogenous peaks and the resulting attraction of media interest,
which feeds back and provides a kind of exogenous boost, and so on. Our framework
allows us to generalize beyond these two classes and to predict the sales dynamics as a function
of an arbitrary set of external sources.

Should Amazon.com release its data, we suggest the following promising directions of inquiry.

• Before anything else, the same study should be done again. We expect more accurate
results when working with the real sales and not only with an estimate derived from the
ranks.

• Secondly, direct access to sales should make it possible to invert $S(t)$ to get access
to the news innovations $\eta(t)$ describing the sources of spontaneous buys. We currently
describe $\eta(t)$ as a white noise distributed according to a power law. This zeroth-order
approximation could be improved, leading to a better understanding of the statistical
properties of the news $\eta(t)$. One can even imagine reconstructing the specific history of
news that were amplified by the epidemic process to obtain a given sales history for a
given book.

• Different kinds of books should involve different social networks. Take the example of the 
book "Divine Secrets of the Ya-Ya Sisterhood" by R. Wells. It became a bestseller two 
years after publication, with no major advertising campaign [l^- Following the reading 
of this originally small budget book, "Women began forming Ya-Ya Sisterhood groups of 
their own [...]. The word about Ya-Ya was spreading [...] from reading group to reading 
group, from Ya-Ya group to Ya-Ya group" (l^. By looking at different classes of books, 
we would expect to highlight different network characteristics jlfll. 

• Amazon has the addresses of its customers. This can be used to study the geographical
spread of books.

Title | Author | Time | Rank | Reason
Get with the Program | Bob Greene | 3/16/2002 | 1st | The book was seen on the Oprah Winfrey Show (B. Greene is O. Winfrey's trainer).
Get with the Program Daily Journal | Bob Greene | 10/19/2002; 1/4/2003; 10/19/2002 | 1st; 1st; 2nd | The book was seen on the Oprah Winfrey Show (B. Greene is O. Winfrey's trainer).
Adios Muchachos | Daniel Chavarria | 7/5/2002 | 6th | On 7/4/2002, the book was on the first page of the Art Section of the New York Times.
Sacred Contracts: Awakening Your Divine Potential | Caroline Myss | 8/16/2002 | 1st | ?
Micawber's Museum of Art | John Lithgow | 9/30/2002 | 3rd | ?
Refrigerator Rights: Creating Connections and Restoring Relationships | Dr. Will Miller | 1/26/2003 | 12th | The author has appeared on a variety of national television programs.
Star Wars: Episode II, Attack of the Clones | R. A. Salvatore | 5/20/2002 | 13th | The movie was released on 5/16/2002.
Wolves of the Calla (The Dark Tower, Book 5) | Stephen King | 11/11/2003 | 3rd | On 11/19, the 2003 Medal for Distinguished Contribution to American Letters was conferred upon Stephen King.

TABLE III: List of the 10 peaks that have an exponent $p$ larger than 0.65 (see Fig. 6). The last column
suggests a possible cause of the exogenous shock, as far as we have been able to tell.

In October 2004, Amazon completely redesigned its ranking system. See
http://www.fonerbooks.com/surfing.htm for details. Not only has the meaning of the ranks
changed, due to the inclusion of e-books and Marketplace sales, but the infrastructure
of the system has also been modified: the three tiers are gone, and all ranks are updated every cycle,
which runs around once an hour. It is thus no longer possible to compare the ranks before
and after October 2004. It is important to realize that Amazon's ranking system may again
evolve at a moment's notice.
[16] We indeed found a strong spectral peak at a period of 1 day in the spectrum of the time series of
book ranks.
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The sales are falling off 



The book doesn't sell anymore. 

The final spot, extremely stable, is based 
on the total number of copies sold over 
a long time scale. 



long time scale ranking schemes 
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FIG. 1: Two time series of ranks showing that Amazon switches between at least two ranking schemes, 
as explained in the text. The first book (top) is "See No Evil" by R. Baer. The second book (bottom)
is "F'd Companies" by P. Kaplan.

FIG. 2: Rank ordering (Zipf) plot of the sales $S$ per day, as a function of rank $R$. For $R$ in the range
from 10 to 10,000, the sales as a function of rank can be fitted to a good approximation by the
power law $S(R) \sim 1/R^{1/\mu}$ with exponent $\mu = 2.0 \pm 0.1$, as shown by the straight line with slope $-0.5$.
This translates into $R \sim 1/S^{\mu}$, which is proportional to the complementary cumulative distribution of
sales. Note that the bend at large ranks (small sales) can be considered to be a finite size effect. The
tail of the distribution of sales for large sales is described by the low ranks, for which the distribution
does not appear to bend but actually exhibits huge fluctuations, as shown in Fig. 3 (reproduced from M. Rosenthal).

[Figure 3: axes labeled SALES and RANK.]

FIG. 3: Rank ordering (Zipf plot) as in Fig. 2, extended to the smallest as well as
largest ranks. Reproduced with kind permission from M. Rosenthal.

[Figure 4: panel title "Example of an exogenous shock".]

FIG. 4: Time evolution over a year of the sales per day of two books: Book A (top) is "Strong
Women Stay Young" by Dr. M. Nelson and Book B (bottom) is "Heaven and Earth (Three Sisters
Island Trilogy)" by N. Roberts. The difference in the patterns is striking, Book A (resp. B) exhibiting
an exogenous (resp. endogenous) peak.




tc-*o (days) 

FIG. 5: cj(ic) defined in (HH function of tr. 
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Value of the power law exponent p for aftershocks 



FIG. 6: Histogram of the estimated power law exponents $p$ of the relaxations of the sales. One
can clearly identify two clusters: the endogenous cluster with exponent $1 - 2\theta$ close to 0.4 and the
exogenous cluster with exponent $1 - \theta$ close to 0.7, compatible with the estimate $\theta \approx 0.3$. The peaks
shown here are those used in Fig. 7 (see section IV C).

FIG. 7: Relaxation after the peak (for both endogenous and exogenous cases) and precursory acceleration
(for the endogenous case). We average (log average) over the peaks classified as endogenous
and exogenous according to their precursory growth. The same sample of books was used for both
Fig. 6 and Fig. 7.

FIG. 8: Time evolution of the sales of the book "Get with the Program" (see Table III). Each time the book
is presented on the Oprah Winfrey Show, the sales jump overnight and then relax according to the
exogenous response function $K(t) \sim 1/t^{1 - \theta}$.

FIG. 9: Time evolution of the sales of the 10 books of Table III. For each time series, $t = 0$ is the
time of the peak. The sales jumped just before the peak except for "Star Wars" and "Wolves of the
Calla."

FIG. 10: Relaxation of the sales after the peak for "Star Wars" and "Wolves of the Calla." The
expected power law behavior for exogenous shocks can only be observed for $t > 10$ to $20$ days. The
small value of the slope for $t < 10$ to $20$ days can be explained by considering the duration of the
external shock triggered by the media, as explained in the text.

FIG. 11: Histogram of the estimates of the power law exponents $p$ of the relaxations of the sales using
a selection criterion different from that of Fig. 6: among the 1,013 peaks of our pre-filtered database,
we kept those which have a correlation coefficient larger than 0.9, giving 388 peaks.



[Figure 12: log-log plot as a function of $|t - t_c|$ (days). Legend: triangles, Exogenous (Aftershock), $1 - \theta = 0.54$; circles, Endogenous (Aftershock), $1 - 2\theta = 0.4$.]

FIG. 12: Relaxation of sales after the peak following the same methodology as for Fig. 7 but with the
same set as in Fig. 11, with the clustering procedure explained in the text.

FIG. 13: Average variation $\langle \Delta R \rangle$ of the rank as a function of the rank itself. This measurement was performed
over a sample of 14,000 books. One can note spurious behaviors for ranks close to $10^4$ and $10^5$. This
can be rationalized by a shift in ranking schemes for these values (see section II B 1). The data can only
be exploited for $R < 10^4$ to avoid these artifacts. The most striking feature is the non-monotonic
behavior of $\langle \Delta R \rangle$ for $R < 10^4$. This enables us to reject an exponential conversion (see section V B 2).

FIG. 14: Average variation of the rank $\langle \Delta R \rangle$ as a function of the rank itself. We only take into account
those time series that do not exhibit abrupt peaks (i.e., we discard exogenous shocks).
Plus signs: experimental data; solid line: fit assuming $S(R)$ to be a power law; dashed line: fit
assuming $S(R) \propto \exp\left( C / R^{\beta - 1} \right)$.

FIG. 15: Rank-to-sales conversion function $S(R)$ estimated using expression (36) for arbitrary values
of $\alpha$, $\langle S \rangle$, $S_{max}$. Consequently, the units on the y-axis are arbitrary.